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Abstract 

Prediction intervals provide a measure of the probable 
interval in which the outputs of a regression model can be 
expected to occur. Subsequently, these prediction intervals 
can be used to determine if the observed output is anomalous 
or not, conditioned on the input. In this paper, a procedure 
for determining prediction intervals for outputs of non- 
parametric regression models using bootstrap methods is 
proposed. Bootstrap methods allow for a non-parametric 
approach to computing prediction intervals with no specific 
assumptions about the sampling distribution of the noise or 
the data. The asymptotic fidelity of the proposed prediction 
intervals is theoretically proved. Subsequently, the validity 
of the bootstrap based prediction intervals is illustrated via 
simulations. Finally, the bootstrap prediction intervals are 
applied to the problem of anomaly detection on aviation 
data. 

1 Introduction 

In regression models, the estimated mean square error 
is often the only indicator of the quality of the predicted 
model. However, the mean square error estimate is in- 
sufficient to answer several questions about the model, 
including: (i) Which regions of the input space is the 
prediction quality good or poor?, and (ii) Conditioned 
on the input, which observed output values in the test 
set are anomalies? Such questions can be answered 
by specifying prediction intervals within which the pre- 
dicted variable is likely to fall at a specified confidence 
level for a given input configuration. 

1.1 Previous work In this section, we highlight 
previous work done in the areas of prediction intervals 
and anomaly detection. Subsequently, we contrast the 
contribution of this paper to the work highlighted in 
this section. 

1.1.1 Prediction intervals Prediction intervals 
have been exclusively studied for parametric linear 
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regression models [9]. In this body of work, two 
assumptions were made throughout: (i) the data was 
being generated via a linear model + noise, and (ii) 
the regression method being used was least-squares 
linear regression. Prediction intervals for linear models 
can be broadly classified into two types - (i) closed 
form expressions for empirical prediction intervals as 
a function of the data [22, 18, 19], and (ii) bootstrap 
based estimates of the prediction intervals [26]. The 
classical closed form prediction interval expressions 
were originally computed under the assumption of the 
noise being normally distributed [22], before being 
generalized [18, 19]. Stine [26] proposed prediction 
intervals based on bootstrap resampling of the training 
data. 

1.1.2 Anomaly detection While anomaly detec- 
tion has been extensively studied in literature [6] , tradi- 
tional anomaly detection algorithms (1-SVM [20], fc-NN 
based methods: iOrca [2], GEM [12], BP-kNNG [24], 
and density based methods: LOF and iForest [16]) do 
not address the following problem: For multivariate 
data of the form (x,y) : x £ R d ,y £ M, given nomi- 
nal training data and a test point (xo,yo), traditional 
anomaly detection methods are capable of checking ei- 
ther if the multivariate test point (xo, yo) is anomalous, 
or if the marginal test point y o is anomalous. How- 
ever, in a scenario where the last dimension y is nom- 
inally an unknown function of the input x, traditional 
anomaly detection algorithms are incapable of checking 
if the marginal test point yo is nominal or anomalous 
given the input xq. 

1.2 Contribution In this paper, an algorithm for 
determining prediction intervals for outputs of general 
non-parametric regression models is proposed using 
bootstrap methods. In contrast to the work done in [22, 
18, 19, 26, 13], where the underlying model is known to 
be linear, our algorithm does not make any assumptions 
about the underlying data model or noise distribution. 
Subsequently, using the prediction intervals, a novel 
anomaly detection algorithm is proposed to test if the 
observed output of a test point is an anomaly or not, 



conditioned on the observed input. 

The rest of this paper is organized as follows. In 
Section 2, the problem is formally stated. In Section 
3, the non-parametric regression models are described 
in detail. The bootstrap procedure for determining 
prediction intervals is described, and their validity is 
proved theoretically and via simulations in Section 4. 
The proposed theory is applied to the problem of 
anomaly detection in Section 5. The proposed anomaly 
detection algorithm is applied to determine aircraft in 
a fleet which are consuming abnormally large amounts 
of fuel. Finally, conclusions are given in Section 6. 

2 Preliminaries 

Throughout this paper, we assume the following model: 

(2.1) y(x) =ip(x) + e(x), 

where ip(x) : R d — » R is some unknown, but de- 
terministic, continuous, (p, C) smooth 1 function, and 
e(x) : R d -> R is an uniform random field used to 
model noise. In particular, for any x\ assume 
that e(xi), ..,e(xt) are iid, 0-mean and have finite vari- 
ance <j 2 < oo. 

Assume that the observed data pairs (x, y) are 
drawn from some underlying joint density f(x,y). In 

particular, let R = {(x^y*),* = 1,2,..} ~ f(x,y ) 
denote a stream of observed training data. Also let R(r) 
denote the first r samples in R: R(r) = {(xj,yj),« = 
l,..,r}. 

2.1 Notation Denote the cumulative distribution 
function corresponding to f{x,y) by F(x,y). For some 
fixed set of realizations R(r), let F R ( r )(x, j/) be the cor- 
responding empirical distribution function [28]. Let E 
be the expectation wrt the underlying probability dis- 
tribution F(x, y). 

Also, we will denote random variables and vectors 
in bold text, and indicate the weak convergence of 
sequences of random variables or vectors by =>. Denote 
the /oo distance between two distributions F(.),G(.) by 

doo(F, G) = sup \F(t) — G(t)|. 
ten 

Finally, for any given training data set Z = {(xj, ?/,), i = 
1, ..,n}, denote the regression model estimated using Z 

by yz{-)- 

2.2 Problem statement Assume that r iid training 
realizations R(r) are available for estimating the model 

1 Please refer to Definition 1 [14] for details about (p, C) smooth 

functions 


y r (x) = t/R( r ) (x) using non-parametric regression meth- 
ods. Given this training data, the goal of this paper is 
to provide a prediction interval for a future observation 
yo = y(xo) corresponding to an input xq- In particular, 
our goal is to determine prediction intervals at level a: 

Ice,r(*£ o) — (la,r (*£()) , Ua,r (^0)) } 
using the input R(r) such that the probability 

Pr{y(x 0 ) G I a ,r(zo)} « 1 - ol. 

The last statement will be made precise in section 3.3. 
3 Regression models 

In this section, we discuss the estimation of the function 
ip(.) using non-parametric regression methods, given the 
set of training realizations R(r). Any of the popular 
non-parametric regression techniques including nearest 
neighbor regression [8], kernel regression [17] , local poly- 
nomial regression [10], partitioning regression [11], sup- 
port vector regression [1] , artificial neural networks [23] 
and decision tress [5] can be used to determine the re- 
gression model. However, we require that the regres- 
sion model satisfies some regularity conditions which are 
listed below. 

3.1 Regularity assumptions on regression 
model The model y r (x) is assumed to be a determin- 
istic, continuous function of the training data R(r). 
The model yy(x) is also assumed to converge to a limit 
denoted by y(x) as r — > oo. Finally, the MSE rate 

e r (x 0 ) = E[yy(x) - V’(^)] 2 
is assumed to decay to 0. In particular, let 
(3.2) e r (x 0 ) = 0(r~ 2j ), 

for some 7 > 0. 

We note that all the non-parametric regression 
models listed in the previous section satisfy these as- 
sumptions. For details, please refer to [14] for nearest 
neighbor, polynomial and kernel regression methods, [7] 
for support vector regression, [15] for artificial neural 
networks, and [21] for decision trees. 

3.1.1 Optimal rates Stone [27] showed that for 
(p, C) smooth function ip(-), the optimal minimax rate 
of convergence that can be obtained by non-parametric 
regression estimates is given by r~ 2p ^ 2p+d \ Con- 
sequently, this implies that 7 is bounded above by 
p/(2p + d) and therefore by 1/2. 



3.2 Bootstrapping Resample the training data 
R(r) a total of t times, each time drawing a set of r 
realizations with replacement. Denote these t sets of r 
realizations by Bj(r) = {xji, Xj r }, i = 1, t. For each 
set of bootstrap realizations B, (r),* = 1, ,.,t, determine 
the regression fit yi, r (x) = Yb i(r)( x )- Fix t = r 7 . 

3.3 Formal problem statement In the next sec- 
tion, we will determine prediction interval I a , r (xo) such 
that 

lim —=L==[Pr{ y(x 0 ) G I a , r (x 0 )} - (1 - a)] = 0. 
r^oo ^/ioglOgr 

In the sequel, denote the function a r (j) = y4og log r/r 1 . 

4 Prediction intervals 

The output y(xo) at input Xo is forecast using y r (xo)- 
The prediction interval for yo = y(xo) is determined as 
follows. The future observation yo can be written as 

yo=y(a:o) = ip(x 0 ) + e(x 0 ) 

= y r (xo) + tp(x 0 ) - yy(x 0 ) + e(x 0 ) 
(4.3) = y r (x 0 ) + r] r (xo) + e(x 0 ), 

where 

rjr(xo) = ip(x 0 ) - y r (x 0 ), 

is the error in the model. To determine the prediction 
interval for yo, we first analyze the distribution of 
Yo — yy(xo) = Vr{x o) + e(xo). Denote the distribution 
of e(xo) by H e (.) and the distribution of f] r (x o) by 
Hr] r (xo) (■)■ 

Now, note that rj r (x o) and e(xo) are independent [4], 
The error in the prediction centered around yy(xo) 
therefore has contribution from the following two in- 
dependent components: (i) the error due to the model: 
r] r (x), and (ii) the error due to the observation noise: 
e(x). 

4.1 Error distribution In this section, we are going 
to investigate the computation of the distributions of 
rj r (x) and e(x). 


In other words, the bootstrap based samples yi, r (xo) 
approximate the distribution of yy(xo) up to 0(a r (O.5)), 
and can therefore be used to compute quantiles of G r (.). 

Let the mean of the distribution G r (.) be Hr and the 
mean of G(.) be p. Then, by (3.2), \/i r — p = 0(r -7 ). 
Next, let 


fir(x o) = 


ZZif^o) 


and observe that fi r (x o) is an estimate of the mean of 
G r . By lemma A. 3, the error |/t r (xo) — Hr I is of order 
O(a r (0.5)) with exponentially high probability. 

Denote the centered samples mi = yi jr (xo) — 
jl r (xo), and the empirical distribution of these samples 
by H r {.). Then, H r {.) will converge weakly to the 
distribution of r] r (xo). In particular, the l a c distance 
between the two distributions, H r ) is of 

order O(a r (0.5)) + 


4.1.2 Observation error Denote the differences 
°i = Ui(xi) — yj(xj). By the bias observation, we ob- 
serve that these iid random variables o j will converge 
in distribution to the true distribution of the noise e(x), 
and the loo distance between the two distributions is 
C>(r- 7 ). 


4.1.3 Distribution of total error From the pre- 
vious two sections, we see that m i,i = 1,2 ,..,t and 
o i,i = l, 2 ,..,r correspond to samples which approxi- 
mate r] r (x o) and e(xo) with error up to 0(a r ( 0.5)) + 
C>(r- 7 ). 

By the independence of m»,i = l,2,..,m and 
o i,i = 1,2 ,..,r, and lemma A. 2 it follows that the 
set C = {tfc = mj-|-Oj,i = l,2,..,m;j = l,2..,r} 
corresponds to samples of the distribution ? 7 r (xo)+e(xo) 
with errors up to 0(a r (O.5)) + 0(r -7 ). 


4.2 Prediction interval The prediction interval 
Ia,n{x o) can then be specified as follows. Let C a , 
Ci-a /2 denote the a/2, 1 — a/2 quantiles of the set 
C respectively. Then, define the prediction interval as 

Ia,r(^o) = y r(x o) T ( C a , 


4.1.1 Model error Realizations of the model error 
r/ r (x o) can be obtained via the bootstrapping approach 
as follows. Let the true distribution of the oracle output 
y(xo) be denoted by G(.) and the distribution of yy(xo) 
be denoted by G r (.). Also denote the distribution of the 
bootstrap iid samples y i, r {x) by G r (.). 

A well known result of the empirical distribution 
function [28] is that doo(F, F R ^) = 0(a r ( 0.5)). Then, 
by direct application of Lemma A.l, 

sup |G r (x) — G r (x)| = G(a r (0.5)). 


The procedure for computing this prediction interval 
is stated in the form of algorithm 1. The theoretical 
and experimental validity of the prediction interval 
are established via theorem 4.1 and in section 4.3 
respectively. 

Theorem 4.1. The prediction interval I a ,r{x o) is an 
asymptotic 1 — a prediction interval in the following 
sense: 

lim Pr{ y(xp) G Iq.rfoo)} - (1 - a) = 0 
r — >oo a r ( 7 ) 



Proof. Let the distribution of the iid realizations {t^} 
be given by T s>r , and the distribution of rj r {xo) + e(xo) 
be given by T a<r . Then, 

d oo (f s ^f Str )=O(a r (0.5))+O(r-^). 

Then, the quantiles C a , C\_ a / 2 converge to the true 
quantiles of T s , r at rate 0(a r (O.5)) + 0{r~ 7 ). This 
implies that Pr{y r (x 0 ) + e(x 0 ) € (C a , Ci_ a / 2 )} = 
1 — a + 0(a r (O.5)) + 0(r~ 7 ). Finally, observing that 
0(a r (O.5)) + 0{r ~ 7 ) = o(a r ('y)) concludes the proof. 


Algorithm 1 Prediction intervals for regression models 
1: procedure PREDICTlNTERVAL(J?(r), a, Xq) 

2: Build regression model y r 

3: Initialize error sample set E = <f> 

4: for each training sample (x i,yf) do 

5: Compute error Oi = yi(xi) — y r (xi) 

6: E — >ELi{oi} 

7: end for 

8: Build t bootstrap samples Bi from R(r ) 

9: Initialize bootstrap sample set D = (f> 

10: for each bootstrap sample Bi do 

11: Build regression models yi, r {x) 

12 : Obtain centered samples m; 

13: D —> D U {m;} 

14: end for 

15: Build the set C by convolving D and E 

16: Obtain C a , C 1 _ a / 2 : 

17: the a/2, 1 — a/2 quantiles of the set C 

18: Set lot, rip 0 ) — yr('l'o) T {C a , C\^ a j 2 } 

19: Return I a , r (x 0 ) 

20: end procedure 


4.3 Simulations In this section, the proposed boot- 
strap based prediction intervals are validated through a 
series of Monte-Carlo experiments. 

4.3.1 Simple linear model In the first experiment, 
we generate data drawn from a simple linear model 

y = 3x + 5 + e, 

where x is drawn uniformly between [0, 1] and e is 
drawn iid from N(0, a 2 ) with cr = 0.1. The size of 
the training set is r = 100. For doing regression, we 
use the artificial neural network regression method and 
then compute prediction intervals based on bootstrap 
samples as described in algorithm 1. Finally, we 
contrast the proposed prediction intervals with the 
classical parametric prediction intervals [22] which have 
been derived for linear regression. 



(a) 



(b) 


Figure 1: Comparison of parametric 95% prediction 
intervals based on linear regression and and bootstrap 
based 95% prediction intervals based on artificial neural 
networks on data generated via a linear model, (b) is a 
zoomed version of (a). From the figures, it is clear that 
there is excellent agreement between the paramteric and 
bootstrap based prediction intervals. 

In the classical model for simple linear regression, 
under the assumption that the noise is distributed iid 
7V"(0, fT 2 ), the appropriate prediction interval [22] for a 
future value yo, given explanatory level xq, is 


Ja,n(x 0 ) i)(x 0 ) ^ ta/2& \ 1 E T y-m , - \2 : 

V n 2_vi:=iv a ' — x i) 

where a is the estimate of the noise. 

We evaluate and plot the true response, noisy ob- 
servations and 95 % prediction intervals corresponding 
to the parametric linear regression and non-parametric 
neural network regression models along s = 100 loca- 
tions placed uniformly in the [0,1] interval in Fig. 1. 
From Fig. 1, it is clear that there is excellent agree- 
ment between the parametric prediction intervals based 
on linear regression and and bootstrap based prediction 


intervals based on artificial neural networks. The pre- 
diction intervals based on the bootstrap are marginally 
wider, accounting for the slower 0 (r -7 ) rate of conver- 
gence of the non-parametric neural network regression 
method in comparison to the linear regression method, 
which enjoys a parametric 0 (r -1 / 2 ) rate of convergence. 

4.3.2 Simple polynomial model To illustrate the 
non-parametric advantage that our bootstrap prediction 
interval algorithm enjoys, we repeat the previous exper- 
iment, but with one difference: we generate the training 
data according to the model: 

y = 3a ; 2 + 5 + e. 

The results are shown in Fig. 2. From Fig. 2, it is clear 
that the parametric prediction interval is inaccurate due 
to mis-specification of the model. On the other hand, 
the non-parametric bootstrap based prediction interval 
is able to accurately determine the correct prediction 
interval. 



(a) 


4.3.3 Multivariate example In the next experi- 
ment, we consider the following non-linear model 

y = exp(xi) +x 2 x\ + log(|x 4 + x 5 |) + e, 

with x = (xi, .., X 5 ) drawn from a multivariate random 
variable with mean vector m = [0.1, 0.2, 0, 0.05, 1.2] and 
covariance matrix 
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We draw a total of r = 1000 training samples, and 
determine the prediction intervals corresponding to 
a = {.005, .01, .02, .05, .1, .2} at s = 100 uniformly 
placed points in the [0, l ] 5 grid. Subsequently, we 
compute the experimentally observed coverage width of 
the prediction intervals for each of the desired coverage 
widths a £ {.005, .01, .02, .05, .1, .2}. The results are 
shown in table 1. From table 1, it is clear that there is 
excellent agreement between the two sets of values. 

For visual illustration, we plot the true response, 
the noisy test samples, and the estimated prediction 
intervals as a function of the sorted index of the true 
response in Fig. 3. From the figure, it is clear that 
the estimated prediction intervals accurately capture 
the variation in the observed test data. In particular, 
we note that the prediction intervals are thinner at 
locations where the variation in y is smaller and vice 
versa. This is in complete agreement with our intuition. 



Figure 2: Comparison of parametric 95% prediction 
intervals based on linear regression and and bootstrap 
based 95% prediction intervals based on artificial neural 
networks on data generated via a quadratic model, (b) 
is a zoomed version of (a). From the figures, it is clear 
that the parametric prediction interval is inaccurate due 
to mis-specification of the model. On the other hand, 
the non-parametric bootstrap based prediction interval 
is able to accurately determine the correct prediction 
interval. 


Desired and observed coverage 

Desired 

.20 

.10 

.05 

.02 

.01 

.005 

Observed 

.198 

.112 

.042 

.019 

.012 

.006 


Table 1: Desired and observed coverage of the proposed 
prediction intervals. There is excellent agreement be- 
tween the two sets of values. 
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Figure 3: Prediction intervals for non-linear model. The 
estimated prediction intervals accurately capture the 
variation in the observed test data. 


5 Application to anomaly detection 

Prediction intervals can be used to determine anomalies 
in the test data in the following manner. Given the 
training data R(r), a test point (xo,yo), and a desired 
false alarm rate a , we construct the prediction interval 
I a ,r(^o) using Algorithm 1. We then declare (xo,yo) to 
be an anomaly at false alarm rate a if yo ^ I a ,r(^o) 
and nominal otherwise. This Conditional Anomaly 
Detection (CAD) scheme for testing a set of anomalies 
S(s) = (x i,yi),i = 1,..,8 is formally described in 
Algorithm 2. 


Algorithm 2 Conditional Anomaly Detection 
1: procedure DETECTANOMALY(R(r), S(s), a) 

2 : Initialize anomaly set A = <fi 

3: for each test sample (xj,t/j) in test set S(s) do 

4: Build prediction interval: I a , r (xi)'. 

5: I a ,r(xi) = PredictInterval(R(r) , a, Xi) 

6: if (y, £ l a ,r(xi)) then 

7: A — ► AU (xi,yi) 

8 : end if 

9: end for 

10: Return A 

11: end procedure 


5.1 Comparison to existing methods The pro- 
posed CAD algorithm differs from popular anomaly de- 
tection algorithms (1-SVM [20], fc-NN based methods: 
iOrca, GEM, BP-kNNG, and density based methods: 
LOF and iForest) in the following respect. Traditional 
anomaly detection methods check for anomalies by test- 
ing for the following null and alternate hypothesis [12]: 

HO : {x 0 ,y 0 ) ~ f(x,y) versus HI : (x 0 ,y 0 ) ^ f(x,y). 

If the data of interest is only the y variable, the anomaly 
detection algorithms could be used to test between the 
hypothesis: 


HO : y 0 ~ f(y) versus HI : y 0 ^ f(y). 

In contrast, the proposed anomaly detection 
method is checking for anomalies by testing between: 

H0:y o ~ f{y\x 0 ) versus HI : y 0 ^ f(y\x 0 ). 

In other words, the traditional algorithms test for 
anomalies wrt the joint distribution f(x,y) or the 
marginal distribution f(y) of y , whereas the proposed 
algorithm tests for anomalies wrt the conditional distri- 
bution f(y\x). 

5.2 Aviation fuel study The environmental impact 
of aviation is enormous given the fact that in the US 
alone there are nearly 6 million flights per year of 
commercial aircraft. Detecting aircraft in a fleet which 
consume excess fuel is therefore extremely important. 
The proposed CAD algorithm can be used to determine, 
on a on a second by second basis, the nominal fuel 
consumption interval as a function of the measured 
state - velocity, acceleration etc - of the aircraft. The 
prediction intervals can then be used to determine 
if the instantaneous fuel consumption of the flight is 
anomalous. 

This fuel study differs from the current state-of-the- 
art used by airline companies, which involves simply 
comparing the actual fuel consumption against averages 
for a given flight or aircraft. Such computations do 
not sufficiently control for the context of the flight 
and therefore may not reveal more subtle performance 
issues. 

5.2.1 Data set The data set used in our study 
is known as Flight Operational Quality Assurance 
(FOQA), which is used for numerous purposes includ- 
ing improving safety and efficiency of the operations of 
commercial and business transport aircraft. In addition 
to the actual fuel consumption, FOQA data includes pa- 
rameters such as velocity, acceleration, pitch rate, pay- 
load etc. This data is measured and monitored on every 


aircraft on a second-by-second basis. The full list of pa- 
rameters can be found in Table 1 [25]. 

In this study, we use FOQA data from an airline 
company corresponding to the one year period between 
May 2010 - April 2011. This data set has FOQA 
information of about 60,000 flights, corresponding to 
330 distinct aircraft. The first 6 months of this data 
is used for training, and the next 6 months are used as 
the test data set S(s). The full description of these data 
sets can be found in [25]. 

5.2.2 Application of CAD The FOQA parameters 
(suitably normalized) are treated as inputs, and the 
observed fuel consumption rate values are treated as 
outputs. This training data is fed as input to CAD, 
and is used to check for anomalies in the later half 
of the year. The results obtained for each time point 
are aggregated on a per-aircraft basis. The results 
corresponding to the sorted top k = 5 aircraft wrt % 
anomaly time are shown in table 2. From the table, it is 
clear that only two aircraft are detected to be consuming 
excess fuel. Furthermore, these anomalies aircraft were 
not detected when using a traditional anomaly detection 
method like iOrca. 


ture observation is specified. The procedure is based on 
bootstrap techniques and does not require any under- 
lying assumptions about the data or noise model. The 
validity of the bootstrap based prediction intervals is 
shown in theory to hold asymptotically. Subsequently, 
the validity of the intervals is proved via monte-carlo 
experiments. 

Next, an anomaly detection algorithm based on pre- 
diction intervals is proposed. The anomaly detection 
algorithm differs from popular anomaly detection algo- 
rithms in that it discovers anomalies wrt conditional 
probability distributions. The anomaly detection algo- 
rithm is applied to discover aircraft in fleet which con- 
sume excess fuel. 


Aircraft* 

# of flights 

% anomaly time 

147 

68 

32.43% 

641 

111 

14.51% 

111 

104 

0.85% 

976 

45 

0.37% 

342 

33 

0.37% 





Table 2: List of aircraft (* = anonymized) and the % 
anomaly time as determined by CAD. Only two aircraft 
in the fleet of 330 aircraft were found to be anomalous. 

5.2.3 Results On further investigation, it was found 
that the longitudinal acceleration sensors were faulty 
in both aircraft. In both cases, the sensor measure- 
ments were lower than the true acceleration of the air- 
craft. This resulted in prediction intervals which were 
shifted down, and subsequently resulted in the true fuel 
consumption falling outside the prediction intervals and 
leading to these aircraft being classified as anomalous. 
This has since been officially confirmed by the airline 
company, and the faulty FOQA sensors in these aircraft 
have since been replaced. 

6 Conclusions 

In this paper, a procedure for determining prediction 
intervals in non-parametric regression models for a fu- 



A General results 


To prove the next part of the statement, consider: 


Consider a probability space (fl,B,P). Let X& : O — > 
R, k = 1,2,.. be a sequence of random variables 
with corresponding distribution functions F k (.). Also, 
assume that this sequence converges in distribution to a 
random variable X : fl — > R with distribution function 
F(.). In particular, let 

d 00 (F k ,F) = 0(k~f } ), 

for some (3 > 0. 

Identically define a sequence of random variables 
Y k : O — » R, k = 1,2,.. with distribution functions 
Gfc(.) and a weak convergence limit to random variable 
Y with distribution G. Similarly, let 

doo(G fe ,G) = 0(k~'i), 


Pr{h r (X k )<a} = Pr{X k (r) G H r (a)} 

(since d oc (F k ,F) = Q(k~ p )) 

= Pr{X(r) G H r (a)} + O(k- 0 ) 

= Pr{h r (X(r)) < a} + 0(k~ 0 ). 

Observing that this is true for any a G R concludes the 
proof. 

Lemma A. 2. The sequence of random variables X k + 
Y k converge weakly to X + Y. Furthermore, ifX k ,Y k 
are independent for every k, the distribution H k (.) of 
X k +Y k converges to the distribution H (.) ofX + Y at 
the rate: 

d 00 (H k ,H)=0(k~ iain 


for some 7 > 0. 

Construct sequences of random variables corre- 
sponding to F k (.), F(.) as follows. Let X k = 
{Xfei,Xfc2,..} where the elements of X k are drawn iid 
from F k (.). Define X = {Xi, X2, in an identical man- 
ner with elements drawn from F(.). Also let X k (r) = 
{Xfci,X fc2 , -,x kr } and X(r) = {Xi,X 2 , ..,X r } be sub- 
sets of size r. Let X = {xi, X 2 , •••} be any fixed sequence 
of real numbers, and let X(r) = {xi,..,x r } denote a 
subset of size r. 

Finally, define a sequence of functions hi : R* — > 
R, i = 1,2,... Assume that this sequence of function 
have a limit in the following sense: for any e > 0 and 
any sequence X , there exists ?’o(e) G N such that for all 
r > r 0 , 


\h r +i{X(r + 1)) - h r (X(r))\ < e. 


Proof. By the Cramer- Wold device [3] , it is trivial to see 
that the joint pairs (X k ,Y k ) converge weakly to (X, Y). 
The result follows by invoking the continuous mapping 
theorem [3]. 


Pr{X k + Y k <a} 



ai)dG k (ai) 



ai)dG(ai) 


+G(/c“ /3 ) + G(fc" T ) 



ai)dG(ai) 


+o(r minA7 ) 

Pr{X + Y < a} 

+o(r min{ft7} ). 


Define the limit of this sequence by h : R°° — » R = 
limr^oo h r (X(r)). Also, define the sets Hi{a ) = {X(i) : 
hi{X(i)) < a} for some aGl. Now, we will prove the 
lemma: 

Lemma A.l. For any r G N, h r (X k ) converges weakly 
to h r (X). Furthermore, the error rate 

sup | Pr{h r {X k ) < a} - Pr{h r (X) < a}\ = 0{k~ 0 ). 

Ot 


Lemma A. 3. Denote the sample mean of X(r) by fi r 
and the mean of F(.) by p. For any 7 < 1/2, denote 
the event 

E 1 = I{ \p r - p\ > log log r/r 7 } . 

Then, 

Pr{Ej} < exp(— 

Proof. The statement follows by direct application of 
the Clrernoff bound. 


Proof. By the Cramer- Wold device [3] and the fact 
that the distributions F k (.) converge to F(.), it clearly 
follows that X k (r) converges weakly to X(r) for all 
r G R. By the continuous mapping theorem, this in 
turn implies that h r (X k ) converges weakly to h r (X). 
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